Abstract
They introduce an extension to the bag-of-words model for learning word representations that takes into account both syntactic and semantic properties of language.
Introduction
The problem with BOW and CBOW models is that they lack sensitivity to word order, which limits their ability to learn syntactically motivated embeddings.
In this work, they propose an extension to the continuous bag-of-words model, which adds an attention model that considers contextual words differently depending on the word type and its relative position to the predicted word (distance to the left/right).
For instance, in the sentence "We won the game! Nicely played!":

- the prediction of the word *played* depends both on the syntactic relation to *nicely*, which narrows down the list of candidates to verbs, and on the semantic relation to *game*, which narrows down the list of candidates to verbs related to games;
- the words *we* and *the* add very little to this particular prediction;
- on the other hand, the word *the* is important for predicting the word *game*, since it is generally followed by nouns.
Attention-Based Continuous Bag-of-words
CBOW with attention
It has an additional weight matrix $K \in \mathbb{R}^{|V| \times 2b}$, where $b$ is the window size on each side of the predicted word. This is a set of parameters that determines the importance of each word type at each relative position.
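To make the role of $K$ concrete, here is a minimal NumPy sketch of computing an attention-weighted context vector in place of CBOW's uniform average. The softmax normalization over context positions and all variable names (`E`, `K`, `context`) are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, b = 10, 4, 2               # vocab size, embedding dim, half window size
E = rng.normal(size=(V, d))      # input word embeddings
K = rng.normal(size=(V, 2 * b))  # importance of each word type at each of the 2b relative positions

# context word ids at relative positions [-b, ..., -1, +1, ..., +b]
context = [3, 7, 1, 5]

# raw importance score for each context word: K[word type, position slot]
scores = np.array([K[w, p] for p, w in enumerate(context)])

# softmax over context positions (assumed normalization)
attn = np.exp(scores) / np.exp(scores).sum()

# attention-weighted context vector; plain CBOW would use attn = 1 / (2 * b)
s = (attn[:, None] * E[context]).sum(axis=0)
print(s.shape)  # prints (4,)
```

With all weights in $K$ equal, the softmax reduces to the uniform weights of standard CBOW, so the model strictly generalizes it.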
Experiments
The model outperforms other methods on part-of-speech induction. It performs similarly to the structured skip-gram model on part-of-speech tagging, while being faster to train.
They also evaluate the model on the Movie Review dataset, but it does not perform as well as CBOW and skip-gram. They attribute this to their model leaning more towards syntax.